Statistical learning: basics, R, and linear regression

MACS 30100
University of Chicago

February 8, 2017

What is statistical learning?

Functional form

\[Y = f(X) + \epsilon\]

  • Statistical learning refers to the set of approaches for estimating \(f\)
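
To make the setup concrete, the following minimal sketch simulates data from \(Y = f(X) + \epsilon\). The function `true_f`, the noise level, the sample size, and the seed are all assumptions chosen for illustration; in a real problem \(f\) is unknown and only the pairs `(xs, ys)` are observed.

```python
import random

def true_f(x):
    # Hypothetical "true" f, chosen only for illustration; unknown in practice.
    return 2.0 + 0.5 * x

rng = random.Random(30100)  # fixed seed so the simulation is reproducible

# Draw predictors X, irreducible errors epsilon, and responses Y = f(X) + epsilon.
xs = [rng.uniform(0, 10) for _ in range(200)]
eps = [rng.gauss(0, 1) for _ in xs]
ys = [true_f(x) + e for x, e in zip(xs, eps)]

# The errors average out to roughly zero, so f captures the systematic part of Y.
mean_eps = sum(eps) / len(eps)
```

Statistical learning methods see only `xs` and `ys` and try to recover something close to `true_f`.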

Linear functional form

Why estimate \(f\)?

  • Prediction
  • Inference

How do we estimate \(f\)?

  • Parametric methods
  • Non-parametric methods

Parametric methods

  1. First make an assumption about the functional form of \(f\)
  2. After a model has been selected, fit or train the model using the actual data

OLS

\[Y = \beta_0 + \beta_{1}X_1 + \epsilon\]

  • \(Y =\) sales
  • \(X_{1} =\) advertising spending in a given medium
  • \(\beta_0 =\) intercept
  • \(\beta_1 =\) slope
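
As a sketch of step 2 (fitting the model once a linear form is assumed), the closed-form least squares estimates for one predictor can be computed directly. The data below are invented and noise-free, and the helper name `fit_simple_ols` is hypothetical; this is an illustration of ordinary least squares, not the course's own code.

```python
def fit_simple_ols(x, y):
    """Least squares estimates (beta_0, beta_1) for y = beta_0 + beta_1 * x."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    # Slope: sum of cross-deviations of x and y over sum of squared deviations of x.
    s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    beta_1 = s_xy / s_xx
    # Intercept: forces the fitted line through the point of means.
    beta_0 = y_bar - beta_1 * x_bar
    return beta_0, beta_1

# Hypothetical data: advertising spending (X_1) and sales (Y).
# Sales were constructed as exactly 5 + 2 * spending, with no noise.
spending = [10.0, 20.0, 30.0, 40.0, 50.0]
sales = [25.0, 45.0, 65.0, 85.0, 105.0]

beta_0, beta_1 = fit_simple_ols(spending, sales)
print(beta_0, beta_1)  # intercept 5.0 and slope 2.0 for this noise-free data
```

Because the assumed linear form matches how the data were built, two numbers summarize \(f\) completely; with real data the fit is only as good as that assumption.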

Non-parametric methods

  • No explicit assumptions about the functional form of \(f\)
  • Use data to estimate \(f\) directly
    • Get close to data points
    • Avoid overcomplexity
  • Requires a large number of observations
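
One of the simplest ways to estimate \(f\) directly from the data is to average the responses of the nearest observations. The sketch below uses k-nearest-neighbors averaging, a simpler relative of local regression and not LOESS itself; the function name `knn_predict` and the data are made up for illustration.

```python
def knn_predict(x_train, y_train, x0, k=3):
    """Estimate f(x0) as the mean response of the k training points nearest x0."""
    # Sort training pairs by distance to x0 and keep the k closest.
    nearest = sorted(zip(x_train, y_train), key=lambda p: abs(p[0] - x0))[:k]
    return sum(y for _, y in nearest) / k

# Hypothetical training data following a curved (quadratic) pattern.
x_train = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
y_train = [1.0, 4.0, 9.0, 16.0, 25.0, 36.0]

# Small k tracks the data closely; larger k gives a smoother, less complex fit.
pred_k1 = knn_predict(x_train, y_train, 3.0, k=1)
pred_k3 = knn_predict(x_train, y_train, 3.0, k=3)
```

No functional form is assumed, but each prediction relies on only a few nearby points, which is why such methods need many observations to work well.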

LOESS

Types of learning

  • Supervised
  • Unsupervised

Statistical learning vs. machine learning

  • Statistical learning
    • Subfield of statistics
    • Focused predominantly on inference
  • Machine learning
    • Subfield of computer science
    • Focused predominantly on prediction

Why R?

Popularity

Things R does well

  • Statistical analysis
  • Data visualization

Things R does not do as well

  • Speed

Why are we not using Python?

Resources for learning R

But I don’t wanna!

Caveat emptor

Acknowledgments